We propose DeepFusion, a modular multi-modal architecture that fuses lidar, camera, and radar in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible. Extracted features are transformed into bird's-eye view as a common representation for fusion. Spatial and semantic alignment is performed prior to fusing the modalities in feature space. Finally, a detection head exploits the rich multi-modal features to improve 3D detection performance. Experimental results for lidar-camera, lidar-camera-radar, and camera-radar fusion show the flexibility and effectiveness of our fusion approach. In the process, we study the largely unexplored task of faraway car detection up to 225 meters, showing the benefits of lidar-camera fusion. Furthermore, we investigate the density of lidar points required for 3D object detection and illustrate the implications with the example of robustness against adverse weather conditions. Moreover, ablation studies on our camera-radar fusion highlight the importance of accurate depth estimation.
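The fusion pipeline described above can be sketched in miniature: per-modality extractors produce bird's-eye-view feature grids that are aligned to a common grid and fused by channel concatenation. All shapes and function names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of BEV-level multi-modal fusion (toy stand-ins for the
# paper's learned extractors and detection head).

def extract_bev_features(points, grid_size, channels):
    """Toy per-modality extractor: histogram 2-D points into a BEV grid,
    replicated into independent channel copies (a CNN would produce
    distinct learned channels)."""
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    for x, y in points:
        gx, gy = int(x) % grid_size, int(y) % grid_size
        grid[gy][gx] += 1.0
    return [[row[:] for row in grid] for _ in range(channels)]

def fuse_bev(feature_maps):
    """Fuse modality feature maps by concatenating along the channel
    axis; maps are assumed already spatially aligned to the same grid."""
    fused = []
    for fmap in feature_maps:
        fused.extend(fmap)
    return fused

lidar = extract_bev_features([(1.2, 3.4), (5.0, 3.3)], grid_size=8, channels=2)
camera = extract_bev_features([(1.1, 3.6)], grid_size=8, channels=3)
fused = fuse_bev([lidar, camera])
print(len(fused))  # 5 channels on the shared 8x8 BEV grid
```

Because every modality lands on the same BEV grid, swapping an extractor only changes how its channels are produced, not how they are fused.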
Online advertising has recently grown into a highly competitive and complex multi-billion-dollar industry, with advertisers bidding for ad slots at large scale and high frequency. This has resulted in a growing need for efficient "auto-bidding" algorithms that determine the bids for incoming queries so as to maximize an advertiser's objective subject to its specified constraints. This work explores efficient online algorithms for a single value-maximizing advertiser under an increasingly popular constraint: Return-on-Spend (RoS). We quantify efficiency in terms of regret relative to the optimal algorithm that knows all queries a priori. We contribute a simple online algorithm that achieves near-optimal regret in expectation while always respecting the specified RoS constraint, when the input sequence of queries consists of i.i.d. samples from some distribution. We also integrate our results with the previous work of Balseiro, Lu, and Mirrokni [BLM20] to achieve near-optimal regret while respecting both the RoS and a fixed budget constraint. Our algorithm follows the primal-dual framework and uses online mirror descent (OMD) for the dual updates. However, we need a non-canonical setup of OMD, so the classic low-regret guarantee of OMD, which is for the adversarial setting in online learning, no longer holds. Nonetheless, in our setting, and more generally wherever low-regret dynamics are applied in algorithm design, the gradients encountered by OMD can be far from adversarial and are instead influenced by our own algorithmic choices. We exploit this key insight to show that our OMD setup achieves low regret in the realm of our algorithm.
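A primal-dual bidder of the kind described can be sketched as follows. The bid-shading rule, step size, and multiplicative dual update are illustrative assumptions in the general spirit of OMD-based auto-bidding, not the paper's exact algorithm.

```python
import math
import random

# Hedged sketch of a primal-dual auto-bidder: a dual variable mu for the
# RoS constraint shades bids, and is updated by an exponentiated-gradient
# (mirror-descent) step on the observed constraint slack.

def run_bidder(queries, eta=0.1):
    mu = 1.0                       # dual variable, kept positive by the update
    total_value = total_spend = 0.0
    for value, price in queries:   # price = cost paid if the auction is won
        bid = value / (1.0 + mu)   # shade the bid by the dual variable
        if bid >= price:           # win the auction
            total_value += value
            total_spend += price
            violation = price - value   # > 0 would mean spend exceeded value
        else:
            violation = 0.0
        mu *= math.exp(eta * violation)  # multiplicative dual update
    return total_value, total_spend

random.seed(0)
queries = [(random.uniform(0.5, 2.0), random.uniform(0.1, 1.0))
           for _ in range(500)]
value, spend = run_bidder(queries)
print(value >= spend)  # True: the RoS constraint (value >= spend) holds
```

Since mu stays positive, every winning bid satisfies price <= value/(1+mu) < value, so the RoS constraint holds on every prefix of the query sequence, mirroring the "always respecting the constraint" property claimed above.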
Active learning is an important technique for automated machine learning systems. In contrast to Neural Architecture Search (NAS), which aims to automate neural network architecture design, active learning aims to automate training data selection. It is especially important for long-tailed tasks, in which positive samples are sparsely distributed. Active learning alleviates the expensive data annotation problem by incrementally training models with efficient data selection: instead of annotating all unlabeled samples, it iteratively selects and annotates the most valuable ones. Active learning is popular in image classification but has not been fully explored in object detection. Most current active learning methods for object detection are evaluated under different settings, making it difficult to compare their performance fairly. To facilitate research in this field, this paper contributes an active learning benchmark framework, named ALBench, for evaluating active learning in object detection. Developed on an automatic deep model training system, the ALBench framework is easy to use, compatible with different active learning algorithms, and ensures the same training and testing protocols. We hope this automated benchmark system helps researchers easily reproduce the performance reported in the literature and make objective comparisons with prior art. The code will be released through GitHub.
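The pool-based loop that such a benchmark standardizes can be sketched as follows. The one-parameter "model" and the uncertainty-style acquisition score are toy placeholders, not ALBench's API.

```python
import math
import random

# Generic pool-based active-learning loop: repeatedly score the
# unlabeled pool, annotate the most valuable samples, and retrain.

def uncertainty(model_weight, x):
    """Toy acquisition score: how close the model's sigmoid score is
    to the decision boundary at 0.5 (closer = more uncertain)."""
    score = 1.0 / (1.0 + math.exp(-model_weight * x))
    return -abs(score - 0.5)

def active_learning(pool, labels, rounds=3, batch=2):
    labeled = []     # indices annotated so far
    weight = 0.0     # toy one-parameter "model"
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        # select the `batch` most uncertain samples and "annotate" them
        picked = sorted(unlabeled,
                        key=lambda i: uncertainty(weight, pool[i]),
                        reverse=True)[:batch]
        labeled.extend(picked)
        # toy retraining: one gradient-like step per labeled sample
        for i in labeled:
            weight += 0.1 * (labels[i] - 0.5) * pool[i]
    return labeled

random.seed(1)
pool = [random.uniform(-2, 2) for _ in range(10)]
labels = [1 if x > 0 else 0 for x in pool]
chosen = active_learning(pool, labels)
print(len(chosen))  # 6 samples annotated over 3 rounds of batch 2
```

Fixing the selection/retraining loop while swapping only the acquisition function is exactly the kind of controlled comparison a shared benchmark protocol enables.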
Deep learning techniques have shown promising results in image compression, with competitive bitrates and image reconstruction quality. However, while image compression has progressed toward higher peak signal-to-noise ratio (PSNR) and fewer bits per pixel (bpp), robustness to adversarial images has never been considered. In this work, we investigate for the first time the robustness of image compression systems, where an imperceptible perturbation of the input image can cause a significant increase in the bitrate of its compressed latent. To characterize the robustness of state-of-the-art learned image compression, we mount white-box and black-box attacks. Our white-box attack applies the fast gradient sign method to the entropy estimate of the bitstream as a bitrate approximation. For the black-box attack, we propose DCT-Net, which simulates JPEG compression with architectural simplicity and lightweight training, as the substitute model, enabling fast adversarial transferability. Our results on six image compression models, each at six different bitrate qualities (36 models in total), show that they are surprisingly fragile: the white-box attack achieves up to a 56.326x change in bpp and the black-box attack up to 1.947x. To improve robustness, we propose a novel compression architecture, ractatn, which incorporates attention modules and a basic factorized entropy model, achieving a promising trade-off between rate-distortion performance and robustness to adversarial attacks that surpasses existing learned image compressors.
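The white-box idea, taking an FGSM-style step on a differentiable bitrate estimate, can be sketched with a toy surrogate. The "compressor" and its bitrate proxy below are illustrative stand-ins, not a learned codec.

```python
# Hedged sketch of an FGSM-style bitrate attack: perturb each pixel in
# the sign direction of the gradient of a bitrate proxy, so the
# "compressed" representation costs more bits.

def bitrate_proxy(image):
    """Toy bitrate surrogate: larger latent magnitudes cost more bits
    (a real codec would use an entropy model over its learned latent)."""
    return sum(abs(2.0 * p - 0.5) for p in image)

def fgsm_bitrate_attack(image, eps=0.01):
    """One FGSM step on the proxy: d|2p - 0.5|/dp = +-2, so move each
    pixel by eps in the gradient-sign direction, clipped to [0, 1]."""
    adv = []
    for p in image:
        sign = 1.0 if (2.0 * p - 0.5) > 0 else -1.0
        adv.append(min(1.0, max(0.0, p + eps * sign)))
    return adv

image = [0.1, 0.4, 0.6, 0.9]
adv = fgsm_bitrate_attack(image)
print(bitrate_proxy(adv) > bitrate_proxy(image))  # True: bitrate increased
```

The black-box variant replaces the true gradient with one computed on a substitute model (DCT-Net in the paper) and relies on the perturbation transferring to the target codec.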
Deraining is an important and fundamental computer vision task that aims to remove the rain streaks and accumulations in images or videos captured on rainy days. Existing deraining methods usually make heuristic assumptions about the rain model, which forces them to employ complex optimization or iterative refinement to obtain high recovery quality. This, however, leads to time-consuming methods and hurts their effectiveness on rain patterns that deviate from the assumptions. In this paper, we propose a simple yet efficient deraining method by formulating deraining as a predictive filtering problem, without complex rain model assumptions. Specifically, we introduce spatially-variant predictive filtering (SPFilt), which adaptively predicts proper kernels via a deep network to filter each individual pixel. Since the filtering can be implemented via well-accelerated convolution, our method is highly efficient. We further propose EfDeRain+, which contains three main contributions to address residual rain traces, multi-scale rain streaks, and diverse rain patterns without harming efficiency. First, we propose uncertainty-aware cascaded predictive filtering (UC-PFilt), which identifies the difficulty of reconstructing clean pixels via the predicted kernels and effectively removes residual rain traces. Second, we design weight-sharing multi-scale dilated filtering (WS-MS-DFilt) to handle multi-scale rain streaks without harming efficiency. Third, to close the gap across diverse rain patterns, we propose a novel data augmentation method (i.e., RainMix) to train our deep models. Combining all contributions with comprehensive analysis of the different variants, our final method outperforms baseline methods on four single-image deraining datasets and one video deraining dataset in terms of both recovery quality and speed.
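The core predictive-filtering formulation can be sketched on a 1-D signal: a per-pixel kernel is applied to each pixel's neighborhood. Here the kernels are produced by a hand-crafted rule standing in for the kernel-prediction network.

```python
# Minimal sketch of spatially-variant predictive filtering: a per-pixel
# 3-tap kernel (network-predicted in the paper, rule-based here) filters
# each position of a 1-D signal.

def predict_kernels(signal):
    """Stand-in for the kernel-prediction network: where a pixel spikes
    above its neighbors (a crude 'rain streak' cue), predict an averaging
    kernel; elsewhere predict the identity kernel."""
    kernels = []
    for i in range(len(signal)):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, len(signal) - 1)]
        if abs(signal[i] - (left + right) / 2.0) > 0.5:   # spiky pixel
            kernels.append((0.5, 0.0, 0.5))   # replace by neighbor average
        else:
            kernels.append((0.0, 1.0, 0.0))   # keep the pixel as-is
    return kernels

def filter_signal(signal, kernels):
    out = []
    for i, (kl, kc, kr) in enumerate(kernels):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, len(signal) - 1)]
        out.append(kl * left + kc * signal[i] + kr * right)
    return out

rainy = [0.2, 0.2, 0.9, 0.2, 0.2]   # the spike plays the rain streak
clean = filter_signal(rainy, predict_kernels(rainy))
print(clean)  # the spike at index 2 is replaced by the neighbor average
```

Because the whole operation is a (spatially-varying) convolution, it maps directly onto accelerated convolution kernels, which is the source of the efficiency claimed above.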
Bone age assessment is challenging in clinical practice due to the complicated assessment process it involves. Current automatic bone age assessment methods are designed with little consideration of the diagnostic logistics and thus may yield uninterpretable hidden states and outputs. Consequently, doctors find it hard to cooperate with such models, since it is difficult to check the correctness of the model predictions. In this work, we propose a new graph-based deep learning framework for bone age assessment from hand radiographs, called Doctor Imitator (DI). The architecture of DI is designed to learn the diagnostic logistics of doctors who use scoring methods (e.g., the Tanner-Whitehouse method) for bone age assessment. Specifically, the convolutions of DI capture the local features of the anatomical regions of interest (RoIs) on a hand radiograph and predict the RoI scores via our proposed anatomy-based group convolution, which are summed up for bone age prediction. In addition, we develop a novel dual-graph-based attention module to compute patient-specific attention for RoI features and context attention for RoI scores. To the best of our knowledge, DI is the first automatic bone age assessment framework that follows the scoring methods without fully supervised hand radiographs. Experiments on hand radiographs with only bone-age supervision demonstrate that DI achieves excellent performance with sparse parameters and provides greater interpretability.
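The scoring-method structure, per-RoI scores that are attention-weighted and summed into a single age estimate, can be sketched with linear stand-ins for the anatomy-based group convolutions; all names and numbers below are illustrative.

```python
# Toy sketch of Tanner-Whitehouse-style prediction: each RoI gets its own
# scorer, and the (attention-weighted) scores sum to the age estimate.

def roi_score(features, weights):
    """Linear stand-in for one anatomy-specific group convolution."""
    return sum(f * w for f, w in zip(features, weights))

def predict_bone_age(roi_features, roi_weights, attention):
    """Sum attention-weighted per-RoI scores into one age estimate,
    returning the scores as well so each RoI's contribution is visible."""
    scores = [roi_score(f, w) for f, w in zip(roi_features, roi_weights)]
    age = sum(a * s for a, s in zip(attention, scores))
    return age, scores

roi_features = [[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]   # 3 RoIs, 2 features
roi_weights = [[0.3, 0.2], [1.0, 1.0], [0.1, 0.4]]    # per-RoI scorers
attention = [1.0, 1.0, 1.0]                            # uniform attention
age, scores = predict_bone_age(roi_features, roi_weights, attention)
print(scores, age)  # per-RoI scores and their attention-weighted sum
```

Exposing the intermediate per-RoI scores is what lets a clinician audit the prediction the way they would audit a manual scoring sheet.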
Increasing research interest focuses on sequential recommender systems, which aim to model dynamic sequence representations precisely. However, the most commonly used loss functions in state-of-the-art sequential recommendation models have essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by numerous negative samples and from prediction biases; Binary Cross-Entropy (BCE) loss is sensitive to the number of negative samples, and is therefore likely to ignore valuable negative examples and reduce training efficiency; Cross-Entropy (CE) loss only focuses on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representations. To avoid these limitations, in this paper we propose to calculate a Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, enjoying the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements in full-ranking NDCG@5 of 125.63%, 69.90%, and 33.24%, respectively. Using CCE, the performance curve of the models on the test data rises rapidly with wall-clock time and is superior to that of other loss functions throughout almost the whole process of model training.
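The contrast between CE on the last timestamp and a cumulative loss over all timestamps can be sketched directly; the logits below are toy values, not model outputs.

```python
import math

# Hedged sketch of the cumulative cross-entropy idea: sum full-softmax
# cross-entropy over every timestamp of the sequence instead of only the
# last one, so each prefix supervises the model.

def softmax_ce(logits, target):
    """Numerically stable cross-entropy of one softmax prediction."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def cumulative_ce(seq_logits, targets):
    """CCE: sum per-timestamp cross-entropy over the whole sequence."""
    return sum(softmax_ce(l, t) for l, t in zip(seq_logits, targets))

def last_step_ce(seq_logits, targets):
    """Plain CE: supervise only the final timestamp."""
    return softmax_ce(seq_logits[-1], targets[-1])

seq_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.4, 2.2]]
targets = [0, 1, 2]    # next item observed at each timestamp
print(cumulative_ce(seq_logits, targets)
      >= last_step_ce(seq_logits, targets))  # True: CCE covers every step
```

Every timestamp contributes a non-negative term, so the cumulative loss always dominates the last-step loss; the claimed benefit is that all of those terms carry training signal rather than only the final one.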
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries to attack each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework that trains a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta-generator can be quickly fine-tuned on the feedback information of the new task, as well as a few historical attacks, to produce effective perturbations. Moreover, since the meta-training procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
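The surrogate-pretrain-then-query-finetune pattern can be sketched with scalar stand-ins: a one-parameter generator is trained against a white-box surrogate, then adapted with a few finite-difference "query feedback" steps against the target. Every model and objective below is a toy assumption, not the paper's meta-learner.

```python
# Illustrative sketch: pre-train a perturbation generator on a surrogate
# model (cheap, white-box), then fine-tune it with a few query-feedback
# steps against the target model.

def attack_loss(model_w, x, delta):
    """Attack objective: drive the model's score of x + delta down."""
    return model_w * (x + delta)

def finetune_generator(gen_w, model_w, x, steps, lr=0.5, h=1e-4):
    for _ in range(steps):
        # finite-difference gradient of the objective w.r.t. gen_w,
        # standing in for gradient estimates built from query feedback
        grad = (attack_loss(model_w, x, (gen_w + h) * x)
                - attack_loss(model_w, x, gen_w * x)) / h
        gen_w -= lr * grad
    return gen_w

surrogate_w, target_w, x = 1.0, 1.3, 2.0
gen_w = finetune_generator(0.0, surrogate_w, x, steps=20)  # surrogate pre-train
gen_w = finetune_generator(gen_w, target_w, x, steps=3)    # few target queries
print(attack_loss(target_w, x, gen_w * x)
      < attack_loss(target_w, x, 0.0))  # True: perturbation lowers the score
```

The point of the sketch is the budget split: most optimization happens for free on the surrogate, and only the short second phase spends queries on the target.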
Supervised Deep-Learning (DL)-based reconstruction algorithms have shown state-of-the-art results for highly-undersampled dynamic Magnetic Resonance Imaging (MRI) reconstruction. However, the requirement of excessive high-quality ground-truth data hinders their application due to the generalization problem. Recently, Implicit Neural Representation (INR) has emerged as a powerful DL-based tool for solving inverse problems by characterizing the attributes of a signal as a continuous function of the corresponding coordinates in an unsupervised manner. In this work, we propose an INR-based method to improve dynamic MRI reconstruction from highly undersampled k-space data, which takes only spatiotemporal coordinates as inputs. Specifically, the proposed INR represents the dynamic MRI images as an implicit function and encodes them into neural networks. The weights of the network are learned from the sparsely-acquired (k, t)-space data itself alone, without external training datasets or prior images. Benefiting from the strong implicit continuity regularization of INR, together with explicit regularization for low-rankness and sparsity, our proposed method outperforms the compared scan-specific methods at various acceleration factors. For example, experiments on retrospective cardiac cine datasets show an improvement of 5.5 ~ 7.1 dB in PSNR for extremely high accelerations (up to 41.6-fold). The high quality and inner continuity of the images provided by INR have great potential to further improve the spatiotemporal resolution of dynamic MRI, without the need for any training data.
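The INR idea, representing a signal as a continuous function of coordinates fit only to sparse samples of that signal, can be sketched in one dimension. A linear model on fixed Fourier features stands in for the neural network; the signal and sampling pattern are toy assumptions.

```python
import math

# Minimal sketch of an implicit representation: fit a continuous function
# of the coordinate t to a handful of samples, then query it anywhere,
# including at coordinates that were never sampled.

def features(t):
    """Fixed Fourier features, a crude stand-in for an MLP's capacity."""
    return [1.0, math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)]

def fit_inr(samples, steps=2000, lr=0.05):
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        for t, y in samples:                 # one SGD step per sample
            phi = features(t)
            err = sum(wi * p for wi, p in zip(w, phi)) - y
            w = [wi - lr * err * p for wi, p in zip(w, phi)]
    return w

def query(w, t):
    return sum(wi * p for wi, p in zip(w, features(t)))

signal = lambda t: 0.5 + 0.3 * math.sin(2 * math.pi * t)   # "ground truth"
samples = [(k / 7.0, signal(k / 7.0)) for k in range(7)]   # sparse samples
w = fit_inr(samples)
print(abs(query(w, 0.5) - signal(0.5)) < 1e-2)  # True: accurate at unseen t
```

The continuity of the fitted function is what fills in between the sparse samples, which is the same mechanism the abstract credits for reconstructing between sparsely acquired (k, t)-space data.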
Recent studies have shown that using an external Language Model (LM) benefits end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set remains quite challenging. Long-tail prediction problems have been widely studied in many applications, but have only been addressed by a few studies for ASR and LMs. In this paper, we propose a new memory-augmented, lookup-dictionary-based Transformer architecture for LMs. The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens. Through extensive experiments on Chinese and English data sets, our proposed method is shown to outperform the baseline Transformer LM by a large margin on both word/character error rate and tail token error rate, without any impact on decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting ASR decoding performance, especially for long-tail tokens.
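The lookup-dictionary mechanism can be sketched in the retrieval style: contexts seen in training are stored as key-to-next-token entries, and at inference the dictionary's distribution is interpolated with the base LM's. The table structure, interpolation weight, and toy corpus are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of a lookup-dictionary-augmented LM: training contexts
# are stored with their observed next tokens, and lookups contribute
# probability mass that rescues rare (long-tail) tokens.

def build_dictionary(corpus):
    """Map each training context (here, the previous token) to counts of
    the tokens that followed it."""
    table = {}
    for prev, nxt in zip(corpus, corpus[1:]):
        table.setdefault(prev, {})
        table[prev][nxt] = table[prev].get(nxt, 0) + 1
    return table

def predict(table, base_probs, context, lam=0.5):
    """Interpolate base LM probabilities with the dictionary lookup."""
    counts = table.get(context, {})
    total = sum(counts.values()) or 1
    vocab = set(base_probs) | set(counts)
    return {tok: (1 - lam) * base_probs.get(tok, 0.0)
                 + lam * counts.get(tok, 0) / total
            for tok in vocab}

corpus = ["the", "kakapo", "sings", "the", "kakapo", "sleeps"]
table = build_dictionary(corpus)
base = {"cat": 0.6, "kakapo": 0.01, "dog": 0.39}  # base LM underrates the tail
probs = predict(table, base, context="the")
print(probs["kakapo"] > base["kakapo"])  # True: the lookup boosts the tail token
```

The base model's scores are untouched and the lookup is a table read, which is consistent with the claim that the gain comes without any impact on decoding efficiency.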